PERF: Categorical getitem perf #30747

Merged

TomAugspurger
Contributor

Convert to an array earlier on.
Closes #30744
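
To illustrate the idea of converting to an array earlier, here is a minimal sketch. `ToyCategorical` is a hypothetical class, not pandas' `Categorical`, and the code is not this PR's diff. It only shows a list indexer being turned into an ndarray before any dtype checks run.

```python
import numpy as np


class ToyCategorical:
    """Hypothetical stand-in for a categorical array: integer codes plus categories."""

    def __init__(self, codes, categories):
        self.codes = np.asarray(codes, dtype=np.int64)
        self.categories = list(categories)

    def __getitem__(self, key):
        # Scalar positions return the category value itself.
        if isinstance(key, (int, np.integer)):
            return self.categories[self.codes[key]]
        # Convert a list to an ndarray up front, so the checks below
        # inspect a dtype instead of walking a Python list.
        if isinstance(key, list):
            key = np.asarray(key)
        if isinstance(key, np.ndarray) and key.dtype == bool:
            if len(key) != len(self.codes):
                raise IndexError("boolean indexer has the wrong length")
        return ToyCategorical(self.codes[key], self.categories)


data = ToyCategorical([0, 1, 2, 0, 1], ["a", "b", "c"])
subset = data[[0, 2, 4]]
print(subset.codes.tolist())
```

The design point is that once the key is an ndarray, deciding whether it is a boolean mask is a single dtype comparison.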

@TomAugspurger
Contributor Author

TomAugspurger commented Jan 6, 2020

Master

%timeit data[list_]
1.44 ms ± 5.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This PR

%timeit data[list_]
740 µs ± 5.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Will post the timing from before the NA-indexing change in a bit.

Edit:

7b35099 (1 commit prior to NA indexing)

728 µs ± 7.57 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

So we're back to where we were before the NA-indexing change.
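
For readers who want to reproduce a timing like the ones above outside IPython, a rough sketch follows. The snippets above do not show how `data` and `list_` were built, so the sizes and values here are assumptions, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup: the construction of `data` and `list_` is not shown above,
# so these sizes and values are illustrative only.
N = 100_000
rng = np.random.default_rng(0)
data = pd.Categorical(rng.choice(["a", "b", "c"], size=N))
list_ = list(range(10_000))  # a plain Python list of positions

result = data[list_]

# Average seconds per call over a fixed number of calls.
n_calls = 200
per_call = timeit.timeit(lambda: data[list_], number=n_calls) / n_calls
print(f"{per_call * 1e6:.1f} µs per call")
```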

@jreback
Contributor

jreback commented Jan 6, 2020

Cool, can you add an asv?

@jreback added the Categorical, Indexing, and Performance labels Jan 6, 2020
@TomAugspurger
Contributor Author

> can you add an asv

I think Joris caught this because of the existing asv benchmark, CategoricalSlicing.time_getitem_list: https://pandas.pydata.org/speed/pandas/#categoricals.CategoricalSlicing.time_getitem_list?commits=6efc2379-b9de33e3
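
For context, an asv benchmark of this kind is a class with a `setup` method and methods prefixed with `time_`. The sketch below follows that convention, but the class name `CategoricalListGetitem` and the sizes are hypothetical. It is not a copy of the pandas `CategoricalSlicing` benchmark.

```python
import numpy as np
import pandas as pd


class CategoricalListGetitem:
    """asv-style benchmark sketch; class name and sizes are hypothetical."""

    def setup(self):
        # asv runs setup() before timing, so construction cost is excluded.
        n = 100_000
        codes = np.arange(n) % 3
        self.data = pd.Categorical.from_codes(codes, categories=["a", "b", "c"])
        self.list_ = list(range(10_000))

    def time_getitem_list(self):
        # Methods prefixed with time_ are the ones asv measures.
        self.data[self.list_]


# Smoke test outside asv: run the benchmark body once.
bench = CategoricalListGetitem()
bench.setup()
bench.time_getitem_list()
```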

@jbrockmendel
Member

LGTM

@TomAugspurger added this to the 1.0 milestone Jan 6, 2020
@TomAugspurger merged commit d3f94a4 into pandas-dev:master Jan 6, 2020
@TomAugspurger deleted the categorical-getitem-perf branch January 6, 2020 19:28
@jorisvandenbossche
Member

So I reported this for Categorical (indeed because of the existing benchmark), but a similar change was made for other dtypes in #30308. The boolean and integer (and numpy) arrays also have the same issue.

Labels: Categorical (Categorical Data Type), Indexing (Related to indexing on series/frames, not to indexes themselves), Performance (Memory or execution speed performance)
Development

Successfully merging this pull request may close these issues.

PERF: Categorical indexing performance regression
4 participants